Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable SVE Support for L2 Metric Computation in FP32 #969

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

adarshs1310
Copy link

@adarshs1310 adarshs1310 commented Nov 29, 2024

Description:
This PR introduces SVE (Scalable Vector Extension) enablement for L2 metric computation in FP32. The changes enhance performance for most indexing methods compared to NEON, with observed speed-ups across multiple algorithms.

Changes in This PR:

  • Added SVE optimizations for L2 metric computation in FP32.
  • Updated CMakeLists.txt to include support for -march=armv8-a+sve.
  • Refactored compute kernels to leverage SVE intrinsics.

Benchmark Results:

Performance benchmarks(32 vcpus) were conducted using both NEON and SVE on ARM architecture. Below are the results showcasing execution times (in seconds):

image

Key observations:

  • All of the algorithms exhibit performance gains with SVE when compared with NEON.

Note: SVE support has been made optional, as not all functions have been fully enabled yet. To utilize SVE, please compile with -march=armv8-a+sve.

/kind feature
Fixes #782

@sre-ci-robot sre-ci-robot requested review from foxspy and hhy3 November 29, 2024 10:34
@sre-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adarshs1310
To complete the pull request process, please assign zhengbuqian after the PR has been reviewed.
You can assign the PR to them by writing /assign @zhengbuqian in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot
Copy link
Collaborator

Welcome @adarshs1310! It looks like this is your first PR to zilliztech/knowhere 🎉

Copy link

mergify bot commented Nov 29, 2024

@adarshs1310 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!.

@adarshs1310 adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from 1783ca6 to 4186e1a Compare November 29, 2024 11:31
@mergify mergify bot added dco-passed and removed needs-dco labels Nov 29, 2024
@adarshs1310 adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from e304e26 to fe28988 Compare November 29, 2024 12:29
svbool_t pg = svptrue_b32();

while (i < d) {
if (d - i < svcntw())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this if condition is not needed, just pg = svwhilelt_b32(i, d); should be sufficient

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @alexanderguzhva , for the valuable suggestions. During development, we considered this approach as well, and our reason for going with the current approach is as follows:

Using the if condition to update pg only in the last iteration avoids unnecessary updates and reduces the dependency chain introduced by the svwhilelt instruction. This optimization minimizes stalls caused by these dependencies, allowing the processor pipeline to operate more efficiently.

@@ -48,7 +48,7 @@ endif()

if(__AARCH64)
set(UTILS_SRC src/simd/hook.cc src/simd/distances_ref.cc
src/simd/distances_neon.cc)
src/simd/distances_neon.cc src/simd/distances_sve.cc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is not sufficient.
Knowhere is designed as a library that picks function pointers according to CPU capabilities, detected upon the start.
For example, SSE / AVX2 / AVX512 code files have different corresponding compile options

target_compile_options(utils_sse PRIVATE -msse4.2 -mpopcnt)
target_compile_options(utils_avx PRIVATE -mfma -mf16c -mavx2 -mpopcnt)
target_compile_options(utils_avx512 PRIVATE -mfma -mf16c -mavx512f -mavx512dq
-mavx512bw -mpopcnt -mavx512vl)

So, it seems to be logical that the new SVE code should also contain some form of different flags, such as -march=armv8-a+sve for distances_sve.cc. And I don't believe that I see this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your valuable feedback @alexanderguzhva

We considered two approaches: first, to keep Neon as the default until all FP32 functions for SVE have been fully implemented. Based on this, we decided to proceed with this approach.

Your approach is absolutely correct, and we agree that the Neon fallback for non-SVE functions can be effectively handled using hook.cc.

We have incorporated these changes into our solution.

@adarshs1310 adarshs1310 force-pushed the sve-l2-fp32 branch 2 times, most recently from 97b3ee0 to dde2c81 Compare November 30, 2024 07:12
@adarshs1310
Copy link
Author

@alexanderguzhva @foxspy @hhy3

Requesting updates to the CI pipeline to address GCC header file conflicts.

Since Ubuntu 22.04 defaults to GCC-11, there is a risk of incorrect header usage when GCC-12 is installed. To resolve this, the GCC-11 header files should be removed after installing GCC-12.

The necessary changes have been incorporated into the ARM-based Ubuntu 22.04 Docker configuration as part of this PR, ensuring proper SVE compatibility. Kindly review and consider these adjustments for consistent and accurate builds. Thank you!

@adarshs1310
Copy link
Author

/kind feature

Copy link

codecov bot commented Dec 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.87%. Comparing base (3c46f4c) to head (09925c7).
Report is 269 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           main     #969       +/-   ##
=========================================
+ Coverage      0   73.87%   +73.87%     
=========================================
  Files         0       82       +82     
  Lines         0     6916     +6916     
=========================================
+ Hits          0     5109     +5109     
- Misses        0     1807     +1807     

see 82 files with indirect coverage changes

@alexanderguzhva
Copy link
Collaborator

@adarshs1310 would you please rebase on top of master? it contains a fix for UT. Thanks.

@adarshs1310
Copy link
Author

Sure @alexanderguzhva! The rebase has been done now!

@adarshs1310
Copy link
Author

Hi @alexanderguzhva , @foxspy , @hhy3 ,

Could you please look into the SSE fix when you have a chance? The ARM CI issue has already been handled in our code.

Thank you so much for your support!

@foxspy
Copy link
Collaborator

foxspy commented Dec 9, 2024

Hi @alexanderguzhva , @foxspy , @hhy3 ,

Could you please look into the SSE fix when you have a chance? The ARM CI issue has already been handled in our code.

Thank you so much for your support!

SSE's CI will not block; but ARM's CI fails

@alexanderguzhva
Copy link
Collaborator

alexanderguzhva commented Dec 9, 2024

@adarshs1310 I was able to compile and run knowhere on AWS Graviton 3 using GCC-12. I see the following error during unit tests, which seems to be a minor precision issue. Still, would you be able to find the root of this problem?

-------------------------------------------------------------------------------
Test Brute Force with input ids
-------------------------------------------------------------------------------
/home/ubuntu/zilliz/knowhere_sve/knowhere/tests/ut/test_bruteforce.cc:201
...............................................................................

/home/ubuntu/zilliz/knowhere_sve/knowhere/tests/ut/test_bruteforce.cc:243: FAILED:
  REQUIRE( gt_dis[i] == dis[i] )
with expansion:
  156769.3125f == 156769.32812f

Other unit tests pass.

The compilation is the following (given that you have created a profile for GCC 12 for conan):

mkdir build
cd build
conan install .. --build=missing -o with_diskann=True -o with_ut=True -o with_benchmark=True -s compiler.libcxx=libstdc++11 -c tools.build:cxxflags+=[\"-mcpu=neoverse-512tvb\",\"-march=native\"] -s build_type=Release --profile=gcc12
conan build ..

@adarshs1310 adarshs1310 force-pushed the sve-l2-fp32 branch 3 times, most recently from 2454f3d to 94333f4 Compare December 13, 2024 02:27
CMakeLists.txt Outdated
endif()

endif()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete those lines, compiler know where to find arm_neon.h

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have tried, can build pass without this file change

Copy link
Author

@adarshs1310 adarshs1310 Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Presburger ,

Initially, my plan was to keep SVE support optional until all the functions were implemented. For this reason, I provided the -march=armv8-a+sve flag as an optional tag for users to add. However, as per the discussion with @alexanderguzhva , we have now hardcoded -march=armv8-a+sve in the faiss CMake file. Consequently, I agree that we can remove the optional flag from CMake.

This approach works well after removing the optional tag. However, I had intentionally added those lines earlier because I believed the CI setup was incomplete. The reason for this was that not all users rely directly on the Docker environment provided by the CI. Some build the project manually, and if their machine has a different GCC version and they haven’t properly updated their alternatives, it could have caused issues but atleast they will get to know which vection of gcc was used for the header.

While the CI has now been fixed, future developers may still encounter problems in similar situations as and when they build it on local machine. We can consider a better long-term solution to address this later on.

@adarshs1310
Copy link
Author

@Presburger Hey, I noticed the do-not-merge/hold label was added 3 days ago. Could you let me know the reason for the hold and if there’s anything I can address to help move this forward?

@alexanderguzhva @foxspy @cydrain
image

Is this something at my end? because the UT were passing previously with same code base.

@adarshs1310
Copy link
Author

adarshs1310 commented Dec 17, 2024

Rebase has been done

@foxspy @alexanderguzhva @hhy3

#include <cmath>

#include "faiss/impl/platform_macros.h"
#pragma GCC optimize("O3,fast-math,inline")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adarshs1310 Could you please just remove this hacky pragma?
Thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @alexanderguzhva I have removed it. Thanks!

Signed-off-by:Adarsh Srivastava <adarsh.srivastava@fujitsu.com>
Signed-off-by: Adarsh Srivastava <Adarsh.Srivastava@fujitsu.com>
@alexanderguzhva
Copy link
Collaborator

lgtm

@adarshs1310
Copy link
Author

Hi @foxspy and @hhy3! Can we Please get an update on this?

Thank you so much!

@Presburger
Copy link
Collaborator

At this stage, I am not inclined to accept this PR, as it would require us to maintain two sets of ARM binaries when releasing binaries or containers. At this stage, I am not inclined to accept this PR, as it would require us to maintain two sets of ARM binaries when releasing binaries or containers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support SVE
6 participants